SUMMARY


THE  PROBLEM 

Human  performance  test  methodologies  for  use  in  environmental 
research  are  being  developed  at  the  Naval  Biodynamics  Laboratory. 

Repeated  measures  on  the  same  subjects  are  used  almost  exclusively  in 
this  and  many  other  intervention  studies  (e.g.,  drug  and  clinical). 
Suitable tasks, experimental paradigms, and statistical tools are required
to ensure the value of repeated measures investigations.


FINDINGS 


Research  tools  are  described  which  are  applicable  to  repeated 
measures  of  human  performance.  In  the  first  section,  statistical 
criteria  for  tasks  are  delineated,  tools  for  assessment  are  described, 
and examples of applications are given. In the second section, multiple
subject and single subject analyses of intervention experiments are
considered  with  major  focus  on  the  methodological  tools.  The  final 
section  summarizes  these  tools  with  examples  of  their  application. 
INTRODUCTION


Investigations of vibration and other environmental effects almost
exclusively employ repeated measures of subjects, according to Kennedy and
Bittner (1977). The general approach in such studies is to collect data
on one or more trials conducted Before (B), During (D), and After (A) exposure.
The suitability of tasks for repeated measurements, and the methodologies for
analyzing such measurements, are the concern of this report.

Selection  of  a  repeated  measures  paradigm  follows  from  both  theoretical 
and practical considerations. First, interest in the time course of
development of and recovery from environmental effects frequently dictates
repeated measures. Time-course measurements of a single individual or team
during an environmental experiment, for example, may be expected to reveal
features of response to environmental change which would not be observed if a
composite of several individuals, each measured at different times, were
employed (Estes, 1956). In addition, it would be impracticable to study the
time course of effects with independent groups because of the prohibitive
numbers of subjects which would be required. Other reasons for advocating the
use of repeated measures are the increased measurement sensitivity and economy
of such experiments (Fisher, 1935, 1966; Sutcliffe, 1980; Winer, 1971).
Individual differences in subjects may be removed under appropriate
repeated  measures  designs,  but  remain  part  of  the  "error"  in  independent 
groups designs. Figure 1 is a nomogram which illustrates the impact of
sample size (N) and correlation (R) between measures on the minimum significant
(p = .05) differences (D) for one- and two-tailed tests (Carter, Kennedy,
& Bittner, 1981). Measures for independent groups, by definition, would
have an expected R = 0, while repeated measures would generally yield R > 0.
Assuming a fixed N, the change in "sensitivity (D)" with increasing R from
independence (R = 0) to complete dependence (R = 1) can be seen to be quite
large. Similarly, with D fixed, the "economy" of repeated measures can be
seen by noting the reductions in N for repeated (R > 0) versus independent
(R = 0) groups. The last and often most potent argument for repeated
measures  is  the  requirement  to  reduce  subject  risk  in  hazardous  environments. 
Increased  economy  through  use  of  repeated  measures  implies  reduced  subject 
risk  with  fewer  subjects  and  numbers  of  exposures  required  for  a  given  level 
of sensitivity, in addition to reduced financial costs. The reduction of
subject risk and other considerations have led to the adoption of repeated
measures  experimentation  in  this  laboratory. 
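
The dependence of sensitivity on N and R described above can be sketched with a short derivation. This is an illustration only: it assumes (the report does not state this) that both measures share a common variance \sigma^2, and the symbols \sigma, \nu, and t are introduced here for the sketch. Figure 1 itself may be scaled differently.

```latex
\operatorname{Var}\left(\bar{X}_1 - \bar{X}_2\right) = \frac{2\sigma^{2}(1 - R)}{N},
\qquad
D = t_{\alpha,\nu}\,\sigma\,\sqrt{\frac{2(1 - R)}{N}},
\qquad
\nu =
\begin{cases}
N - 1 & \text{repeated measures } (R > 0) \\
2(N - 1) & \text{independent groups } (R = 0)
\end{cases}
```

Under these assumptions, a correlation of R = .75 halves the standard error of the difference relative to R = 0, so that (apart from the change in degrees of freedom) the same D is reached with roughly one quarter of the subjects.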

The requirement for Before-During-After (BDA) experimentation has
motivated this laboratory to develop applicable tasks and methodologies. One
project is underway to evaluate performance test suitability for repeated
measures applications (Kennedy & Bittner, 1977; Carter, Kennedy, & Bittner,
1980; Kennedy, Carter, & Bittner, 1980; Shannon, Carter, & Boudreau, 1981).
This project was identified as the Performance Evaluation Tests for
Environmental Research (PETER) Program in earlier reports. A second project,
focusing on the application of Box-Jenkins (1970) time-series methodology to
BDA experiments, is nearing completion; findings from this program have
already shown considerable promise for this approach (Carter, 1980; Glass,
Wilson, & Gottman, 1975). In addition to these projects, others are underway
which are directed primarily at the effects of impact, vibration, and motion
sickness on performance. Requirements for environmental research, as part of
these programs, have driven developments involving BDA design and analysis.
For example, recent vibration experiments have been conducted with the aim of
developing a methodology applicable to long-term investigations (Guignard,
Bittner, & Carter, 1981). Altogether, this laboratory's programmatic efforts
have resulted in the assembly of a "bag of research tools" which is of value
for repeated measures investigations.

The purposes of this report are twofold. In the first section, statistical
criteria for tasks to be repeatedly measured are delineated, and
examples are given of desirable and undesirable tasks. The second section
focuses on analysis of intervention experiments, including both multiple-subject
experiments and single-subject analysis. The last section summarizes
the  "bag  of  research  tools"  described  in  the  earlier  sections. 

TASK  SELECTION  FOR  REPEATED  MEASURES 
Statistical  Criteria 


Candidate tests for repeated measures studies should meet rigorous
statistical qualification (Jones, 1972, 1980; Kennedy & Bittner, 1977;
Kennedy et al., 1980). Meaningful repeated measures, as outlined by Jones
(1972, 1980), generally require that means, variances, and intertrial
correlations are "well-behaved" when obtained under constant (baseline)
conditions. Baseline conditions, identified by Kennedy and Bittner (1977,
1980) for performance tests, typically involve daily administration of tasks
to 15-20 subjects for 15 workdays. Evaluating tasks across days permits
assessment of differential changes with practice which are uncontaminated by
within-day autocorrelative effects (Campbell & Stanley, 1966; Thorndike, 1949).
Unambiguous assessment of differential change with practice was deemed
necessary because of the substantial evidence for such change (cf. Alvares &
Hulin, 1972). When such changes are occurring, it is difficult to establish
"what is being measured" and to make scientific generalizations (Bittner, 1979;
Jones, Kennedy, & Bittner, 1981). The specific statistical characteristics
considered necessary under baseline conditions are described below.

Means. The criterion for means is that they change in a linear manner
or are unchanging over trials. This criterion has been identified by Campbell
and Stanley (1966) as a requirement for interpretation of repeated measures
results. Significantly, it is unnecessary that this criterion be met from
the first trial, provided practice is carried beyond the point where the
criterion is attained before a cycle of BDA begins. Such a point, it is
noteworthy, is expected with sufficient practice (Reynolds, 1952; Fitts &
Posner, 1967). Hence in task evaluations, means are tested sequentially,
dropping leading days, until this criterion of linearity is met.

Statistical techniques for accomplishing means analysis include graphical,
analysis of variance (ANOVA), and orthogonal polynomial analyses. The
BMDP2V (Dixon & Brown, 1977) computer program, with option ORTHOGONAL,
provides a direct and rigorous analysis.
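
The sequential check of the means criterion can be sketched in a few lines of Python. This is a descriptive stand-in, not the BMDP2V analysis: it measures the share of the between-days sum of squares carried by the linear orthogonal polynomial and drops leading days until that share reaches a cutoff. The function names, the 0.9 cutoff, and the sample means are hypothetical; a real evaluation would use an F test of the higher-order components.

```python
def linear_trend_share(day_means):
    """Fraction of the between-days sum of squares carried by the linear trend."""
    k = len(day_means)
    grand = sum(day_means) / k
    # Linear orthogonal polynomial coefficients for k equally spaced days.
    coeffs = [day - (k + 1) / 2 for day in range(1, k + 1)]
    contrast = sum(c * m for c, m in zip(coeffs, day_means))
    ss_linear = contrast ** 2 / sum(c * c for c in coeffs)
    ss_days = sum((m - grand) ** 2 for m in day_means)
    # Unchanging means also satisfy the criterion, so treat them as fully "linear".
    return ss_linear / ss_days if ss_days > 0 else 1.0


def leading_days_to_drop(day_means, cutoff=0.9, min_days=3):
    """Drop leading days until the linear share reaches the (hypothetical) cutoff."""
    for dropped in range(len(day_means) - min_days + 1):
        if linear_trend_share(day_means[dropped:]) >= cutoff:
            return dropped
    return None  # criterion never met with at least min_days remaining


# Hypothetical day means: a large first-day drop followed by a steady decline.
example_means = [30, 12, 11, 10, 9, 8]
days_dropped = leading_days_to_drop(example_means)
```

With these made-up means the first day is dropped, after which the remaining days decline linearly.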


The need for other supplementary baseline (e.g., within-day)
investigations was noted but not developed by these authors.
Variances. The criterion for within-trial variances is that they are
homogeneous over trials. This criterion, together with constant intertrial
correlations, constitutes compound symmetry, the traditional assumption for
simple repeated measures ANOVA (Box, 1950; Scheffé, 1959; Winer, 1971). As
with the means, it is unnecessary that this criterion be met from the first
trial if practice is carried out beyond the point where the criterion is
attained. Thus, in task evaluation, variances also are tested sequentially,
dropping leading days, until the criterion is met. Statistical techniques
for accomplishing this analysis include graphical methods and a multitude of
analytic tests. Where the normality assumption holds, familiar statistical
analyses (e.g., Fmax) may be employed for this purpose; these, however, are
extremely sensitive to nonnormality (Scheffé, 1959, Chapter 10). Alternate
analyses are suggested where normality is questionable, including Scheffé's
(1959) log-transformed variance analysis or the related Miller's jackknife
analysis (Hollander & Wolfe, 1973). The exact procedure for establishing
homogeneity of variance is less important than its unambiguous establishment.
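
As an illustration of the simplest of these checks, the Fmax statistic is the ratio of the largest to the smallest daily variance. The sketch below only computes the statistic; the function name and sample scores are hypothetical, and the critical value would still have to be taken from Fmax tables for the number of days and the degrees of freedom per variance.

```python
from statistics import variance


def f_max(scores_by_day):
    """Ratio of largest to smallest sample variance across days."""
    variances = [variance(day_scores) for day_scores in scores_by_day]
    return max(variances) / min(variances)


# Hypothetical scores for three subjects on each of three days.
example_days = [
    [1.0, 2.0, 3.0],   # variance 1
    [2.0, 4.0, 6.0],   # variance 4
    [1.0, 3.0, 5.0],   # variance 4
]
example_f_max = f_max(example_days)
```

When sequential testing is wanted, the same function can be applied after removing leading days from the list.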

Correlations. The criterion for the cross-day correlations is that they
are differentially stable (constant). As with the criterion of homogeneous
variances described above, the differential stability criterion is embedded in
the traditional (compound symmetry) requirement for simple repeated measures
ANOVA. Differential stability and homogeneity of variance, in addition to their
implications for ANOVA, are sufficient indications that the Spearman-Brown
formula may be applied to estimate the reliability of a test with changes in
test length (Thorndike, 1949; Winer, 1971). Figure 2 (Kennedy et al., 1980)
shows the tradeoff of reliability and time; it provides a method of assessing
the length of testing required for a reliability found desirable from
consideration of Figure 1. Differential stability, most importantly, implies
that the same attribute is being measured on each occasion of measurement.
With attribute changes, statistical testing may be possible, but attribution
of effect and scientific generalization are precluded (Jones et al., 1981).
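
The Spearman-Brown relation mentioned above can be written out as a small sketch. The function names are illustrative; the formula itself is the standard one, in which a test lengthened by a factor k has reliability kr / (1 + (k - 1)r), and it applies only when the stability and homogeneity conditions just described hold.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def length_factor_needed(reliability, target):
    """Length multiplier required to raise reliability to the target value."""
    return target * (1 - reliability) / (reliability * (1 - target))


# Doubling a test whose reliability is .50 gives a predicted reliability of 2/3.
doubled = spearman_brown(0.5, 2)
# Conversely, reaching 2/3 from .50 requires doubling the test length.
factor = length_factor_needed(0.5, 2 / 3)
```

The second function is the tradeoff read in the other direction: given the reliability wanted, it returns how much longer testing must be.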

Statistical tests for differential stability have been of continuing
concern. In an earlier paper, Bittner (1979) reviewed and illustrated
graphical and analytical methods which were applied in early task
investigations. More recently, the method of Steiger (1980a, 1980b) has been
routinely applied for stability determination. Other methods which have
captured interest include possible applications of factor analysis,
nonparametric directional tests, and jackknife approaches (e.g., Joreskog,
1969; Shannon, 1980; Jones, 1981; Gnanadesikan, 1977). However, because of the
omnibus character of the Steiger analysis and its computer implementation, it
has continued to be recommended. It has been possible to test sequentially for
differential stability by manually dropping leading days. This procedure,
it is noteworthy, was supported by early work of Jones (1970a, 1970b, 1972)
in which differential stability was found to emerge with practice. The
recent development of a nonmanual stepwise program for Steiger (1980a, 1980b)
analysis gives added support for the standard use of this analysis.

Regarding this computer program, LCDR Robert C. Carter can be contacted
at the Naval Biodynamics Laboratory, Box 29407, New Orleans, LA 70189.

Overall, the task criteria described above lead to straightforward
experimental design, simplicity of statistical analysis, and unambiguous
interpretation of results. Augmentation of these criteria with others
may be anticipated for applications within days, where autocorrelative
effects may be expected (cf. Thorndike, 1949; Campbell & Stanley,
1966). In particular, investigations employing Box-Jenkins (1970) models
may be required; they will be described as part of the second section of
this report. Pertinently, the baseline criteria described above also
support the assumptions required for within-day investigations. Tasks
which have been evaluated using the statistical criteria are summarized
in other reports (Kennedy et al., 1980; Kennedy, 1981).

Task Evaluation Examples

Two  evaluations  of  the  statistical  suitability  of  tasks  will  be  given 
to  illustrate  applications  of  the  above  criteria.  The  first  evaluation  is 
of  the  Spoke  Control  Task,  a  motor  dexterity  task  which  was  considered  as 
part of a larger investigation (Bittner, Lundy, Kennedy, & Harbeson, in
press). This task, which successfully met the statistical criteria given
above, was recently employed in an investigation of vibration effects
(Guignard et al., 1981). The second example is a time estimation measure
which has been shown to be unsuitable for repeated measures applications
(McCauley, Kennedy, & Bittner, 1980). Together these examples illustrate
task success and failure.

Spoke Task. Computer-generated paper-and-pencil Spoke Control Task (CT)
forms were produced and printed by a programmed WANG computer on unlined
display sheets. The display sheets (43 cm x 28 cm) contained 32 circular
targets arranged concentrically around a central circular target (marked
0). Each target was 9.3 mm in diameter and located 120.6 mm from the central
target. Distance from the center of one target to an adjacent target was
25.4 mm. The number "1" was in the twelve o'clock position and began an
ascending sequence in a clockwise direction. Each of 18 enlisted male
volunteers was required alternately to tap his stylus on the center target
(0) and on each of the numbered circles (1, 2, ..., 32) in succession (0, 1;
0, 2; ...; 0, 32). Errors, if any, were corrected as they were observed.
The CT score was the time to completion as measured by a stopwatch.
Subjects were tested daily for 15 consecutive workdays, Monday through
Friday, between 0800 and 1000.

Figure 3 shows the means and standard deviations for the CT over
days. A slow linear decline in the means is suggested, but no change is
seen in the standard deviations. The overall change in means was
confirmed by analysis of variance (ANOVA) with F(14, 238) = 3.54; p < .01.
Of the overall sums of squares, 55% was accounted for by a very highly
significant linear component, F(1, 238) = 27.3 (p < 10 ), with no
significant indication of higher order components, F(13, 238) = 1.7 (p > .06).
The apparent lack of change in standard deviations was also confirmed by
a nonsignificant Fmax(15, 17) = 2.92 (p > .1). Hence, the CT means and
variances were stable from the first day.
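
The 55% figure can be checked from the reported F ratios alone. The sketch below assumes (as is usual for such a partition, though not stated in the report) that all three F ratios share the same error mean square, so that each sum of squares is proportional to F times its degrees of freedom. The function name is illustrative.

```python
def share_of_days_ss(f_component, df_component, f_days, df_days):
    """Share of the Days sum of squares carried by one trend component."""
    return (f_component * df_component) / (f_days * df_days)


# Values reported for the Spoke Control Task.
linear_share = share_of_days_ss(27.3, 1, 3.54, 14)
higher_order_share = share_of_days_ss(1.7, 13, 3.54, 14)
```

The linear share comes out at about .55, and the two shares sum to approximately 1, the small shortfall being attributable to rounding of the published F values.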

Table 1 contains the CT reliability coefficients across all days,
from which the correlation traces in Figure 4 were drawn. The traces
shown in this figure were drawn for selected Base Days (1, 2, 4, 8, 10,
and 12) by left-justifying the appropriate row of the correlation matrix
in terms of days after base performance (Bittner, 1979). Examining Figure
4, it can be noted that subsequent to Base Day 1, the traces are level and
overlapping. This pattern indicates that reliabilities are differentially
stable subsequent to the first session (Bittner, 1979; Jones, 1980). A
statistical test of differential stability using the approach of Steiger
(1980a, 1980b), however, yielded χ²(104) = 103.8 (p > .49), which indicated
a constant (r = .799) correlation even from the first session. Conservatively,
the CT is both differentially stable and has high task definition
subsequent to the first session.

Table 1: SPOKE CONTROL TASK RELIABILITIES OVER 15 DAYS (N = 18)
(Bittner, Lundy, Kennedy, & Harbeson, in press)

DAY     2    3    4    5    6    7    8    9   10   11   12   13   14   15
1     .72  .77  .69  .71  .61  .65  .65  .69  .69  .71  .73  .57  .63  .75
2          .88  .87  .85  .85  .72  .86  .79  .84  .85  .85  .86  .79  .92
3               .87  .82  .79  .72  .79  .67  .84  .70  .77  .67  .69  .88
4                    .85  .81  .70  .82  .76  .85  .77  .86  .81  .81  .90
5                         .85  .83  .85  .81  .91  .89  .91  .84  .61  .88
6                              .89  .87  .89  .84  .91  .85  .78  .77  .89
7                                   .78  .84  .78  .83  .80  .69  .58  .80
8                                        .88  .86  .86  .81  .83  .79  .90
9                                             .83  .88  .86  .79  .76  .84
10                                                 .82  .88  .85  .64  .91
11                                                      .90  .82  .77  .87
12                                                           .86  .75  .88
13                                                                .68  .83
14                                                                     .77

Figure 4. Comparison of Spoke Control Task (CT) Reliabilities between Selected Base Days
(1, 2, 4, 8, 10, 12) and Those Following over 15 Days (Bittner, Lundy, Kennedy,
& Harbeson, in press).

Time Estimation. Constant Error (CE), a global estimation measure,
was considered as part of the larger McCauley et al. (1980) investigation.
Daily scores for each of 19 enlisted men were derived from 5 productions
of each of 8 time intervals (2, 3, 5, 6, 8, 9, 11, and 12 seconds) without the
subjects' knowledge of results. On each of 15 weekdays, the 40 trials (8
intervals  by  5  replications)  were  given  in  random  order.  A  subject's  daily 
CE  was  his  mean  deviation  from  the  specified  intervals  over  trials. 

CE means, variances, and cross-day correlations were analyzed
subsequent to data collection. Figure 5 shows the means and standard
deviations and suggests little change with practice. A repeated measures
ANOVA, it is noteworthy, yielded F(14, 252) = 0.90 (p > .56) for Days. The
cross-day correlation results, given in Table 2 and illustrated in Figure
6, are dramatically different. Examining this figure, it can be noted
that the reliability across Day 1 and Day 2 is 0.80 but that the
reliabilities between Day 1 and succeeding days fall effectively to zero.
The average reliability between immediately adjacent days (r(1,2), r(2,3),
..., r(14,15)) can also be computed from Table 2 to be r = 0.80. However,
as seen in Figure 6, the fall-off pattern with succeeding days continues
and can be seen as late as Day 12. Even if the task were stable beyond this
point, the more than three hours of practice required would make it
unattractive for repeated measures research. McCauley et al. (1980) also found
such instability for a variety of other time estimation global measures,
transformed measures, and subtask scores. Certainly the results did not
contradict Posner's (1978) view that there is no general time estimation trait.
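
The two observations made about the CE reliabilities can be reproduced from the Table 2 entries. In the sketch below the variable and function names are illustrative; the numbers are the Day 1 and Day 12 rows of Table 2 and the first entry of every row (the adjacent-day correlations). Because the table is upper triangular, each row is already "left-justified" in days after base performance.

```python
# Selected rows of Table 2 (Constant Error): correlations of the base day
# with each following day, in order of days after base performance.
CE_ROWS = {
    1: [.80, .40, -.14, .08, -.04, .16, .08, .03, -.12, -.19, -.05, -.21, -.26, -.26],
    12: [.88, .84, .83],
}

# First entry of each row of Table 2: Day 1 with 2, Day 2 with 3, ..., Day 14 with 15.
CE_ADJACENT = [.80, .59, .67, .70, .80, .83, .79, .94, .84, .76, .75, .88, .89, .90]


def trace(rows, base_day):
    """Correlation trace for a base day as (days after base, correlation) pairs."""
    return list(enumerate(rows[base_day], start=1))


def average(values):
    return sum(values) / len(values)


adjacent_mean = average(CE_ADJACENT)
# Mean Day 1 correlation from three days after base onward.
day1_later_mean = average([r for lag, r in trace(CE_ROWS, 1) if lag >= 3])
```

The adjacent-day mean rounds to .80, while the Day 1 trace from three days out averages close to zero, which is the fall-off pattern described in the text.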

ANALYSIS  OF  REPEATED-MEASURES  EXPERIMENTS 

This  section  of  the  paper  describes  some  tools  for  design  and  analysis 
of  repeated-measures  experiments  and  techniques  for  multiple-subject  and 
single-subject  experiments. 

Multiple-Subject  Experiments 

The most commonly analyzed effect of motion and vibration is a change
of mean performance. In an experiment which includes measurements Before
(B), During (D), and After (A) the treatment, with no carryover of treatment
into A, the contrast (D - (B + A)/2) represents the mean effect of
the treatment, independent of the mean effect of practice, which is
represented by A - B. Figure 7 illustrates these contrasts, which respectively
are identical with the quadratic and linear orthogonal polynomials for
trends in the three repeated measures B, D, and A. The BMDP2V (Dixon &
Brown, 1977) computer program, described earlier, may be used to calculate
statistical tests of these contrasts and their interactions with other
contrasts yet to be discussed.
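
The treatment and practice contrasts described above can be computed directly from each subject's B, D, and A scores. The sketch below is illustrative only: the names and the three subjects' scores are hypothetical, and it computes the mean contrasts without the significance tests that a program such as BMDP2V would supply.

```python
from statistics import mean

# Contrast weights applied to (B, D, A).
TREATMENT = (-0.5, 1.0, -0.5)   # D - (B + A)/2
PRACTICE = (-1.0, 0.0, 1.0)     # A - B


def contrast(weights, scores):
    """Weighted sum of one subject's (B, D, A) scores."""
    return sum(w * s for w, s in zip(weights, scores))


def bda_effects(subjects):
    """Mean treatment and practice effects over subjects."""
    treatment = [contrast(TREATMENT, s) for s in subjects]
    practice = [contrast(PRACTICE, s) for s in subjects]
    return mean(treatment), mean(practice)


# Hypothetical (B, D, A) scores for three subjects.
example_subjects = [(70, 60, 80), (72, 65, 78), (68, 58, 76)]
treatment_effect, practice_effect = bda_effects(example_subjects)

# The two sets of weights are orthogonal, so the effects are separable.
weights_dot = sum(t * p for t, p in zip(TREATMENT, PRACTICE))
```

For these made-up scores the mean treatment effect is a decrement of 13 units against a mean practice gain of 8 units.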


Table 2: CONSTANT ERROR (CE) RELIABILITIES OVER 15 DAYS (N = 19)
(McCauley, Kennedy, & Bittner, 1980)

DAY     2    3    4    5    6    7    8    9   10   11   12   13   14   15
1     .80  .40 -.14  .08 -.04  .16  .08  .03 -.12 -.19 -.05 -.21 -.26 -.26
2          .59  .22  .34  .28  .44  .40  .30  .14  .07  .16 -.05 -.02 -.07
3               .67  .73  .49  .54  .37  .20  .09  .12  .16  .12  .06  .03
4                    .70  .69  .65  .53  .38  .28  .25  .27  .28  .19  .12
5                         .80  .65  .62  .55  .38  .32  .42  .37  .36  .28
6                              .83  .87  .82  .63  .57  .57  .52  .55  .37
7                                   .79  .70  .61  .53  .61  .53  .46  .39
8                                        .94  .80  .75  .72  .57  .66  .47
9                                             .84  .73  .72  .54  .62  .46
10                                                 .76  .90  .82  .78  .78
11                                                      .75  .61  .70  .54
12                                                           .88  .84  .83
13                                                                .89  .96
14                                                                     .90
Figure 6. Comparison of Constant Error (CE) Reliabilities between
Selected Base Days (1, 2, 4, 8, 10, 12) and Those Following (McCauley,
Kennedy, & Bittner, 1980). Horizontal axis: days after base performance.
Figure 7. Example of Treatment and Practice Effects. (Plot titled "Treatment
and Practice Effects" showing Before, During, and After performance levels,
with the practice effect and the treatment effect marked.)

If a second experiment is conducted employing the same subjects to
produce measurements B', D', and A', then the treatment and practice mean
effects in the two experiments can be compared directly. This comparison
may be made because the treatment and practice contrasts are independent
of the level of performance B + D + A or B' + D' + A', which would tend
to increase with practice from the first to the second experiment. In
Figure 8 the independence of the contrasts (D - (B + A)/2) can be seen
across four sequential experiments at 8, 16, 32, and 8 Hz vibration
conditions. The 8 Hz conditions show statistically consistent decrements in
the first and last experiments, while the other conditions show no effects.
These data were obtained in a successful application of this approach
where experiments were typically separated by intervals of several weeks
(Guignard et al., 1981).

In  addition  to  effects  on  the  means,  the  treatments  may  also  affect 
the  variances  as  illustrated  in  Figure  9.  Changes  of  variances  should  be 
considered  both  for  behavioral  interpretations  and  for  validation  of 
assumptions  underlying  the  analyses  of  effects  on  means.  In  terms  of 
behavior,  variances  during  the  treatment  may  decrease  if  the  treatment 
causes  the  subjects  to  adopt  a  stereotyped  response,  or  prevents  them 
from  responding.  Variances  may  increase  if  the  treatment  affects  subjects 
to varying degrees. Other phenomena may also alter the variance of
performance, and any inhomogeneity of variance raises questions about the
validity  of  many  techniques  for  assessing  mean  effects.  The  literature  of 
statistics  abounds  with  tools  for  comparing  variances;  at  least  one  of 
them  should  be  in  an  experimenter's  tool  bag. 

Intertrial correlations should be examined for evidence of changes
in the performance standings of the subjects relative to each other.
Changes of the correlations can be tested with Steiger's MULTICORR computer
program (1980a, 1980b). If correlations between treatment (D) and baseline
(A and B) scores are lower than correlations between baselines, then
subjects were not all equally affected by the treatment, nor was the
effect linearly related to baseline scores. Figure 10 gives a hypothetical
example of changes in correlations with environmental impact. These
results would be expected if the treatment disrupts the abilities typically
employed on a task so that subjects alter their test-taking strategy. In
general, intertrial correlations represent the degree of consistency in
subjects' responses to the treatment. If the correlations change, then
the experimenter is alerted to an inconsistent effect.

Figure 9. Illustration of Changes of Variance. (Plot titled "Changes of
Variance in a Repeated Measures Experiment" with Before, During, and After
on the horizontal axis.)

Figure 10. Illustration of Subjects' Responses for Changed Intertrial
Correlations. (Plot of subjects' responses Before, During, and After, with
the following intertrial correlation matrix.)

       B     D     A
B    1.0   -.1    .9
D          1.0   -.1
A                1.0

Even if the intertrial correlations are relatively constant, there
are three different types of treatment effect which could be
occurring (Bittner, 1981). Performance during the treatment (D) could
differ from baseline performance ((B + A)/2) by an additive constant, by
a multiplicative constant, or by a combination of these, as shown in the
upper part of Figure 11. The first type of effect indicates that all
subjects were affected equally by the treatment. The latter two types of
effect could occur if the treatment affected the top performers more (or
less) than others, as illustrated in Figure 11. Analysis of covariance
would be an appropriate tool to use if these latter types of effects were
occurring. With the tools discussed in the preceding paragraphs of this
section, an experimenter can construct answers to many questions about
his results. For instance, what was the mean effect of the treatment?
Was the effect consistent for all subjects? Was the effect proportional
to baseline performance? Was the variability of performance changed by
the treatment? How does the effect of one treatment compare with the
effect of another? Ordinarily, these questions have not all been
considered, perhaps because appropriate tools were not at hand.
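
The distinction between additive and multiplicative treatment effects can be illustrated by regressing performance during treatment on baseline performance: a slope near 1 with a nonzero intercept suggests an additive effect, and a slope different from 1 with an intercept near zero suggests a multiplicative one. The sketch below is a minimal, noise-free illustration; the function names, tolerances, and scores are hypothetical, and real data would call for analysis of covariance with proper significance tests.

```python
def fit_line(baseline, during):
    """Least-squares slope and intercept of during-treatment on baseline scores."""
    n = len(baseline)
    mean_x = sum(baseline) / n
    mean_y = sum(during) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, during))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify_effect(slope, intercept, slope_tol=0.05, intercept_tol=1.0):
    """Label the effect using hypothetical tolerances on slope and intercept."""
    shifted = abs(intercept) > intercept_tol
    scaled = abs(slope - 1.0) > slope_tol
    if shifted and scaled:
        return "combined"
    if scaled:
        return "multiplicative"
    if shifted:
        return "additive"
    return "none"


# Hypothetical baseline scores, (B + A)/2, for four subjects.
baseline_scores = [60, 70, 80, 90]
additive = classify_effect(*fit_line(baseline_scores, [50, 60, 70, 80]))
multiplicative = classify_effect(*fit_line(baseline_scores, [48, 56, 64, 72]))
combined = classify_effect(*fit_line(baseline_scores, [43, 51, 59, 67]))
```

The three made-up score sets correspond to a constant 10-unit decrement, a 20% proportional decrement, and a mixture of the two.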

Experiments  with  a  Single  Subject  or  Team 

It  is  possible  to  obtain  much  valuable  information  from  data  on 
single  subjects.  If  more  than  one  subject  were  available,  comparisons 
between  results  for  each  subject  indicate  the  generalizability  of  the 
results.  Tools  applicable  for  analyzing  single-subject  data  are  presented 
in  detail  by  Box  and  Jenkins  (1970)  and  Glass,  Wilson,  and  Gottman  (1975). 
Collectively,  these  tools  constitute  an  approach  to  analysis  of  "time 
series."  They  assume  a  series  of  at  least  50  observations  at  approximately 
equal  intervals  of  time  representing  a  process  which  has  some  unchanging 
statistical  properties  described  by  the  references  cited.  The  criteria 
for  evaluating  the  stability  of  means,  variances,  and  intertrial  correlations 
described  earlier  provide  a  basis  for  making  the  statistical  assumptions  of 
time-series  analysis. 

Time-series  tools  can  be  used  to  infer  changes  of  mean  level,  slope, 
variance, or even more subtle characteristics of the subject's responses.
Furthermore, the dynamics of the response to treatments can be studied
without the loss of fidelity caused by aggregating several subjects' data.
Time-series methods can also be used to study cycles of behavior, to
examine feedback among several variables, or to forecast performance in the
future.  Generally,  the  time-series  methods  discussed  in  this  report 
consist  of  finding  a  stochastic  model  for  the  data. 

For example, Glass, Wilson, and Gottman (1975) offer a research
tool  for  showing  whether  the  variance  of  a  time  series  changes  in  response 
to  a  treatment  intervention.  First  a  model  is  fit  to  the  series,  then 
that  model  is  applied  separately  to  data  from  before  and  after  the  inter¬ 
vention.  The  ratio  of  the  residual  variances  from  these  two  applications 
of  the  model  is  a  statistic  with  an  F  distribution  when  there  has  been  no 
change  in  variance.  By  comparing  an  empirical  statistic  with  a  table  value 
of  the  F  distribution,  it  is  possible  to  determine  whether  a  detectable 
change  of  variance  was  associated  with  the  treatment  intervention. 
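
The variance comparison just described can be sketched as follows. The report does not specify the time-series model, so the sketch assumes, purely for illustration, a first-order autoregressive model with a known coefficient (phi) and level; the function names and the two short series are hypothetical. The resulting ratio would then be referred to an F table with degrees of freedom that depend on the model and the series lengths.

```python
from statistics import variance


def ar1_residuals(series, phi, level):
    """Residuals of an assumed AR(1) model: deviations not predicted by the previous value."""
    return [
        (series[t] - level) - phi * (series[t - 1] - level)
        for t in range(1, len(series))
    ]


def residual_variance_ratio(before, after, phi, level):
    """Ratio of post-intervention to pre-intervention residual variance."""
    return variance(ar1_residuals(after, phi, level)) / variance(
        ar1_residuals(before, phi, level)
    )


# Hypothetical series in which the swings double after the intervention.
before_series = [1.0, -1.0, 1.0, -1.0, 1.0]
after_series = [2.0, -2.0, 2.0, -2.0, 2.0]
ratio = residual_variance_ratio(before_series, after_series, phi=0.5, level=0.0)
```

In practice the model would be identified and estimated from the data (with at least 50 observations, as noted earlier) rather than supplied by hand.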

Time-series  analysis  also  includes  tools  for  investigating  changes 
of  the  level  of  a  series  of  observations  from  before  to  after  a  treatment. 
Time-series  analysis  goes  beyond  the  usual  tests  of  mean  effects  because 
it  also  characterizes  how  the  level  of  the  series  changed  over  time.  This 
branch of time-series analysis is called intervention analysis (Box & Tiao,
1975). Intervention analysis can be used for response curves deemed marginally
or totally uninterpretable with traditional methods of repeated-measures
analysis (Campbell & Stanley, 1966). Responses that are delayed, gradual,
oscillating,  or  show  other  dynamic  forms  can  be  accommodated. 

Finally,  it  is  possible  that  a  treatment  alters  the  dynamics  of  a 
series  of  performance  measurements.  That  is,  the  form  of  the  dependency 
of present responses on past responses may change. To test for this
eventuality, a stochastic model is fit to the observations made before the
treatment. The same model is applied to the observations made after the
treatment. The autocorrelations (Box & Jenkins, 1970) of the residuals
from modeling the observations after the treatment are combined using Box's
formula:

$$Q = T(T+2) \sum_{i=1}^{k} \frac{r_i^{2}}{T-i} \tag{1}$$

where r_i is the lag-i autocorrelation of the residuals, k (≥ 20) is the
number of autocorrelations, T is the number of observations, and p is the
number of parameters in the time-series model.

If the dynamics of the two series are the same, Q is chi-square distributed
with k - p degrees of freedom. Hence, if Q is statistically significant,
then  a  change  of  the  dynamics  of  performance  is  indicated.  This  tool  might 
be  used,  for  example,  to  determine  whether  the  form  of  neurophysiological 
evoked  potential  measurements  is  altered  by  exposure  to  impact. 
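
The computation of Q can be sketched as follows. The example assumes that
the sum in Equation (1) runs over the squared residual autocorrelations at
lags 1 through k, each divided by T minus its lag (the form often called the
Ljung-Box statistic). The function names are illustrative, and centering the
residuals on their mean before computing autocorrelations is a choice made
for this sketch.

```python
def autocorrelations(residuals, max_lag):
    """Sample autocorrelations of the residuals at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    c0 = sum(d * d for d in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / c0
        for lag in range(1, max_lag + 1)
    ]


def portmanteau_q(residuals, max_lag):
    """Q = T (T + 2) * sum over lags i of r_i**2 / (T - i)."""
    T = len(residuals)
    r = autocorrelations(residuals, max_lag)
    return T * (T + 2) * sum(
        r_i ** 2 / (T - i) for i, r_i in enumerate(r, start=1)
    )


def degrees_of_freedom(max_lag, n_params):
    """Degrees of freedom, k - p, for the chi-square reference distribution."""
    return max_lag - n_params
```

Here `residuals` would be the residuals obtained by applying the
pre-treatment model to the post-treatment observations. The resulting Q is
compared with a tabled chi-square value on k - p degrees of freedom.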


SUMMARY  OF  THE  BAG  OF  RESEARCH  TOOLS 


Table 3 is an inventory of the bag of research tools that has been
assembled for repeated measurements. In the left-hand column, an
application of each tool is given. Tools are described in the right-hand
column. This bag of tools, as with most, is incomplete and contributions
are welcome.


Table 3: A Tool Bag for Repeated Measurements


APPLICATION                             TOOL

Evaluate a task's suitability           Widely distributed (e.g., daily)
for repeated measurements               measurements made in standard
                                        conditions (e.g., Kennedy &
                                        Bittner, 1980)

Check for stability of means (linear    Repeated measures ANOVA with
trend or no trend with practice)        orthogonal trend analysis (e.g.,
                                        Dixon & Brown (1977) BMDP2V)

Check for stability (homogeneity)       Analytic tests (e.g., Fmax) for
of variances                            equality of variances (Hollander &
                                        Wolfe, 1973; Scheffé, 1939; Winer,
                                        1971)

Check for differential stability        Steiger (1980a, 1980b) CORRMAT
of intertrial correlations

Represent effects of treatments,        Experimental designs involving one
practice within experiments, and        measurement Before (B), During
practice between experiments            (D), and After (A) the treatment.
                                        Contrasts: D - (B + A)/2 represents
                                        treatment effects; B - A represents
                                        within-experiment practice; and
                                        (B' + A') - (B + A) represents
                                        practice effects between conditions.
                                        These contrasts are merely linear and
                                        quadratic trends in ANOVA (Guignard,
                                        Bittner, & Carter, 1981).

Represent treatment effects in          Analysis of Covariance and
which the effect, D - (B + A)/2,        Effect Models (Bittner, 1981;
has a non-unitary proportional          Winer, 1971)
component relative to (B + A)/2

Check for consistency of treatment      CORRMAT on BDA three-trial
effect (i.e., D - (B + A)/2 = K for     correlation matrix
all subjects?)

Represent treatment effects in a        Time-series intervention
single-subject experiment               analysis (Box & Tiao, 1975)

Test for change of variance of a        Glass, Wilson, and Gottman's
single subject's performance            (1975) F-test
from before to after a treatment

Test for change of a subject's          Box's test of autocorrelation
response dynamics from before to        (Box & Jenkins, 1970)
after a treatment

Account for autocorrelations and        Box-Jenkins (1970) stochastic
biological cycles in repeated-          time-series models
measures data
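
The Before-During-After contrasts listed in Table 3 reduce to simple
arithmetic on each subject's scores, as the sketch below illustrates. The
function names and the numbers in the usage example are hypothetical. B'
and A' are taken here to mean the Before and After scores from a second
condition.

```python
def bda_contrasts(before, during, after):
    """Return (treatment effect, within-experiment practice) for one subject.

    treatment effect = D - (B + A)/2
    within-experiment practice = B - A
    """
    treatment = during - (before + after) / 2
    practice_within = before - after
    return treatment, practice_within


def practice_between(before_1, after_1, before_2, after_2):
    """Practice effect between conditions: (B' + A') - (B + A).

    before_1/after_1 are B and A from the first condition;
    before_2/after_2 are B' and A' from the second (assumed) condition.
    """
    return (before_2 + after_2) - (before_1 + after_1)


if __name__ == "__main__":
    # Hypothetical scores for one subject in two successive conditions.
    print(bda_contrasts(10.0, 14.0, 12.0))
    print(practice_between(10.0, 12.0, 12.5, 13.5))
```

Computing the treatment contrast separately for every subject also gives
the values needed to ask whether the effect is the same constant K for all
subjects.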
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